{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Binary data transfer\n",
    "====================\n",
    "\n",
    "### Motivation\n",
    "\n",
    "Visualizations of genomic data, massive social networks, or sensor readings often\n",
    "need to plot millions of points rather than merely hundreds of thousands.\n",
    "\n",
    "By default, pydeck sends data from Jupyter to the frontend by serializing it to JSON. However, for massive data sets,\n",
    "the cost of serializing and deserializing this JSON can prevent a visualization from rendering.\n",
    "\n",
    "To get around this, pydeck supports binary data transfer, which significantly reduces data size. Binary transfer relies\n",
    "on [NumPy](https://numpy.org/) and its [typed arrays](https://numpy.org/devdocs/user/basics.types.html),\n",
    "which are converted to [JavaScript typed arrays](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays) and passed to\n",
    "deck.gl [using precalculated binary attributes](https://deck.gl/#/documentation/developer-guide/performance-optimization?section=supply-attributes-directly).\n",
    "\n"
   ]
  },
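  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To give a rough sense of the size difference, the sketch below compares the JSON text for an array of random ``float32`` positions with the raw bytes of the same NumPy array. It is only an illustration under assumed data (the ``positions`` array is made up) and does not reproduce pydeck's actual wire format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "# Assumed sample data: 10,000 random 3D positions stored as float32\n",
    "rng = np.random.default_rng(0)\n",
    "positions = rng.random((10_000, 3), dtype=np.float32)\n",
    "\n",
    "# Size of the same values as JSON text versus raw typed-array bytes\n",
    "json_bytes = len(json.dumps(positions.tolist()).encode(\"utf-8\"))\n",
    "binary_bytes = positions.nbytes  # 4 bytes per float32 value\n",
    "\n",
    "print(f\"JSON: {json_bytes:,} bytes, binary: {binary_bytes:,} bytes\")"
   ]
  },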
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Usage\n",
    "\n",
    "Binary transport will only work if the following requirements are met:\n",
    "\n",
    "- **Use Jupyter.** Binary transfer only works within Jupyter environments via `pydeck.bindings.deck.Deck.show`. It relies on the socket-level\n",
    "  communication built into the Jupyter environment.\n",
    "- **Set ``use_binary_transport``.** ``use_binary_transport`` must be set to ``True`` explicitly on your ``Layer``.\n",
    "- **Pass data as a DataFrame.** The ``Layer``'s ``data`` argument must be a ``pandas.DataFrame`` object.\n",
    "- **Don't pass unused columns.** Data that is not intended to be rendered should not be passed into the ``Layer``.\n",
    "- **Prefer columns of lists where possible.** Accessor names must be strings representing column names within the ``pandas.DataFrame``. For example, ``get_position='position'`` is correct, **not** ``get_position=['x', 'y']``.\n",
    "  Data in a flat format, where ``x`` and ``y`` represent a position and ``r``, ``g``, and ``b`` represent color values, like this:\n",
    "\n",
    "| x | y |  r  |  g  | b |\n",
    "|---|:--|:----|:----|:--|\n",
    "| 0 | 1 |  0  |  0  | 0 |\n",
    "| 0 | 5 | 255 |  0  | 0 |\n",
    "| 5 | 1 | 255 | 255 | 0 |\n",
    "\n",
    "should be converted to a nested format like this:\n",
    "\n",
    "| position   | color         |\n",
    "|------------|:--------------|\n",
    "| [0, 1]     | [0, 0, 0]     |\n",
    "| [0, 5]     | [255, 0, 0]   |\n",
    "| [5, 1]     | [255, 255, 0] |"
   ]
  },
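  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the conversion described above, the cell below rebuilds the small example table with pandas and nests its columns. The names ``flat`` and ``nested`` are illustrative only and are not part of pydeck."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Flat format: one column per coordinate and per color channel\n",
    "flat = pd.DataFrame(\n",
    "    {\"x\": [0, 0, 5], \"y\": [1, 5, 1], \"r\": [0, 255, 255], \"g\": [0, 0, 255], \"b\": [0, 0, 0]}\n",
    ")\n",
    "\n",
    "# Nested format: one column of lists per accessor\n",
    "nested = pd.DataFrame(\n",
    "    {\n",
    "        \"position\": flat[[\"x\", \"y\"]].values.tolist(),\n",
    "        \"color\": flat[[\"r\", \"g\", \"b\"]].values.tolist(),\n",
    "    }\n",
    ")\n",
    "nested"
   ]
  },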
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example\n",
    "\n",
    "To demonstrate this, we'll plot the nodes of a random graph generated with the [networkx](https://networkx.github.io/) library:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pydeck as pdk\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "NODES_URL = \"https://raw.githubusercontent.com/ajduberstein/geo_datasets/master/social_nodes.csv\"\n",
    "\n",
    "\n",
    "def generate_graph_data(num_nodes, random_seed):\n",
    "    \"\"\"Generates a random graph of ``num_nodes`` nodes with a 3D force layout\n",
    "\n",
    "    This function is unused but serves as an example of how the data in\n",
    "    this visualization was generated\n",
    "    \"\"\"\n",
    "    import networkx as nx  # noqa\n",
    "\n",
    "    g = nx.random_internet_as_graph(num_nodes, random_seed)\n",
    "    node_positions = nx.fruchterman_reingold_layout(g, dim=3)\n",
    "\n",
    "    force_layout_df = pd.DataFrame.from_records(node_positions).transpose()\n",
    "    force_layout_df[\"group\"] = [d[1][\"type\"] for d in g.nodes.data()]\n",
    "    force_layout_df.columns = [\"x\", \"y\", \"z\", \"group\"]\n",
    "    return force_layout_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use deck.gl's OrbitView (example [here](https://deck.gl/gallery/point-cloud-layer)), which lets us plot these points in 3D."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The data we'll render is a 3D position with some categorical data (``group``):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nodes = pd.read_csv(NODES_URL)\n",
    "nodes.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need this flattened data in a nested format, so we transform it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nodes[\"position\"] = nodes.apply(lambda row: [row[\"x\"], row[\"y\"], row[\"z\"]], axis=1)\n",
    "\n",
    "# Remove all unused columns\n",
    "del nodes[\"x\"]\n",
    "del nodes[\"y\"]\n",
    "del nodes[\"z\"]\n",
    "nodes.head()"
   ]
  },
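  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The row-wise ``apply`` above is easy to read but can be slow on millions of rows. As a hedged alternative, the sketch below builds the same kind of nested column by converting the coordinate columns to a NumPy array in one step. It uses a small hypothetical frame, ``demo``, rather than the ``nodes`` data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-in for the flat node data\n",
    "demo = pd.DataFrame(\n",
    "    {\"x\": [0.0, 1.0], \"y\": [2.0, 3.0], \"z\": [4.0, 5.0], \"group\": [\"a\", \"b\"]}\n",
    ")\n",
    "\n",
    "# Convert the three coordinate columns at once instead of row by row\n",
    "demo[\"position\"] = demo[[\"x\", \"y\", \"z\"]].to_numpy().tolist()\n",
    "\n",
    "# Drop the flat columns now that they are nested\n",
    "demo = demo.drop(columns=[\"x\", \"y\", \"z\"])\n",
    "demo"
   ]
  },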
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll also color the points by group, then normalize the colors to the 0-1 range because they are provided as floats. See the [deck.gl documentation](https://deck.gl/docs/developer-guide/performance#supply-attributes-directly)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "colors = pdk.data_utils.assign_random_colors(nodes[\"group\"])\n",
    "# Divide by 255 to normalize the colors\n",
    "# Specify positions and colors as columns of lists\n",
    "nodes[\"color\"] = nodes.apply(\n",
    "    lambda row: [c / 255 for c in colors.get(row[\"group\"])], axis=1\n",
    ")\n",
    "# Remove unused column\n",
    "del nodes[\"group\"]\n",
    "nodes.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "view_state = pdk.ViewState(\n",
    "    offset=[0, 0],\n",
    "    latitude=None,\n",
    "    longitude=None,\n",
    "    bearing=None,\n",
    "    pitch=None,\n",
    "    zoom=10\n",
    ")\n",
    "views = [pdk.View(type=\"OrbitView\", controller=True)]\n",
    "\n",
    "nodes_layer = pdk.Layer(\n",
    "    \"PointCloudLayer\",\n",
    "    nodes,\n",
    "    id=\"binary-points\",\n",
    "    use_binary_transport=True,\n",
    "    get_position=\"position\",\n",
    "    get_normal=[10, 100, 10],\n",
    "    get_color=\"color\",\n",
    "    pickable=True,\n",
    "    auto_highlight=True,\n",
    "    highlight_color=[255, 255, 0],\n",
    "    radius=50,\n",
    ")\n",
    "\n",
    "pdk.Deck(layers=[nodes_layer], initial_view_state=view_state, views=views, map_provider=None).show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Confirming that data is being transferred over web sockets\n",
    "\n",
    "You can open your browser's developer console to confirm the data is being sent via binary transfer. In the console you should see a message like:\n",
    "\n",
    "    >> transport.js:68 Delivering transport message \n",
    "    >> binary: {\"binary-points\": {…}}\n",
    "    >> transport: n {name: \"Jupyter Transport (JavaScript <=> Jupyter Kernel)\", _messageQueue: Array(0), userData: {…}, jupyterModel: n, jupyterView: n}\n",
    "    type: \"json-with-binary\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should also be able to see the binary payload in your developer console's Network tab."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
